Coherence-Centric Logging and Recovery for Home-Based Software Distributed Shared Memory

Authors

  • Angkul Kongmunvattana
  • Nian-Feng Tzeng
Abstract

The probability of failures in software distributed shared memory (SDSM) increases as the system size grows. This paper introduces a new, efficient message logging technique, called the coherence-centric logging (CCL) and recovery protocol, for home-based SDSM. Our CCL minimizes failure-free overhead by logging only data necessary for correct recovery and tolerates high disk access latency by overlapping disk accesses with coherence-induced communication existing in home-based SDSM, while our recovery reduces the recovery time by prefetching data according to the future shared memory access patterns, thus eliminating the memory miss idle penalty during the recovery process. To the best of our knowledge, this is the very first work that considers crash recovery in home-based SDSM. We have performed experiments on a cluster of eight SUN Ultra-5 workstations, comparing our CCL against traditional message logging (ML) by modifying TreadMarks, a state-of-the-art SDSM system, to support the home-based protocol and then implementing both our CCL and the ML protocols in it. The experimental results show that our CCL protocol consistently outperforms the ML protocol: Our protocol increases the execution time negligibly, by merely 1% to 6%, during failure-free execution, while the ML protocol results in the execution time overhead of 9% to 24% due to its large log size and high disk access latency. Our recovery protocol improves the crash recovery speed by 55% to 84% when compared to re-execution, and it outperforms ML-recovery by a noticeable margin, ranging from 5% to 18% under parallel applications examined.
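The overlap of disk accesses with coherence-induced communication described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`flush_log`, `send_diffs_to_home`, `release`), the choice of a release point as the trigger, the log record format, and the in-memory dictionary standing in for a home node are all assumptions made for illustration.

```python
import os
import threading


def flush_log(path, records):
    """Append log records to a file and force them to stable storage."""
    with open(path, "ab") as f:
        for rec in records:
            f.write(rec + b"\n")
        f.flush()
        os.fsync(f.fileno())


def send_diffs_to_home(diffs, home_store):
    """Hypothetical stand-in for the update messages sent to home nodes."""
    for page_id, data in diffs.items():
        home_store[page_id] = data


def release(diffs, pending_log, log_path, home_store):
    """Overlap the log flush with the home-node updates (assumed trigger)."""
    writer = threading.Thread(target=flush_log, args=(log_path, pending_log))
    writer.start()                          # disk access starts in the background
    send_diffs_to_home(diffs, home_store)   # communication proceeds meanwhile
    writer.join()                           # wait for the flush before continuing
```

The point of the sketch is only that the disk write and the communication run concurrently, so the slower of the two, rather than their sum, bounds the added latency.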

Similar resources

Lightweight Logging and Recovery for Distributed Shared Memory over Virtual Interface Architecture

As software Distributed Shared Memory (DSM) systems become attractive on larger clusters, the focus of attention moves toward improving the reliability of systems. In this paper, we propose a lightweight logging scheme, called remote logging, and a recovery protocol for home-based DSM. Remote logging stores coherence-related data to the volatile memory of a remote node. The logging overhead can ...


Logging and Recovery in Adaptive Software Distributed Shared Memory Systems

Software distributed shared memory (DSM) improves the programmability of message-passing machines and workstation clusters by providing a shared memory abstraction (i.e., a coherent global address space) to programmers. As in any distributed system, however, the probability of software DSM failures increases as the system size grows. This paper presents a new, efficient logging protocol for adapti...


Lazy Logging and Prefetch-Based Crash Recovery in Software Distributed Shared Memory Systems

In this paper, we propose a new, efficient logging protocol, called lazy logging, and a fast crash recovery protocol, called the prefetch-based crash recovery (PCR), for software distributed shared memory (SDSM). Our lazy logging protocol minimizes failure-free overhead by logging only data indispensable for correct recovery, while our PCR protocol reduces the recovery time by prefetching data ...


Performance Optimization of Software Distributed Shared Memory Systems

Software Distributed Shared Memory Systems (DSMs, or Shared Virtual Memory) are advocated to be an ideal vehicle for parallel programming because of their combination of programmability of shared memory and scalability of distributed memory systems. The challenge in building a software DSM system is to achieve good performance over a wide range of parallel programs without requiring programmers t...


JIAJIA: A Software DSM System Based on a New Cache Coherence Protocol

This paper describes the design and evaluation of a software distributed shared memory (DSM) system called JIAJIA. JIAJIA is a home-based software DSM system in which physical memories of multiple computers are combined to form a larger shared space. It implements a lock-based cache coherence protocol that eliminates the directory entirely and maintains coherence through accessing write notices kept ...


Publication date: 1999